
    The Conditional Lucas & Kanade Algorithm

    The Lucas & Kanade (LK) algorithm is the method of choice for efficient dense image and object alignment. The approach is efficient as it attempts to model the connection between appearance and geometric displacement through a linear relationship that assumes independence across pixel coordinates. A drawback of the approach, however, is its generative nature. Specifically, its performance is tightly coupled with how well the linear model can synthesize appearance from geometric displacement, even though the alignment task itself is the inverse problem. In this paper, we present a new approach, referred to as the Conditional LK algorithm, which: (i) directly learns linear models that predict geometric displacement as a function of appearance, and (ii) employs a novel strategy for ensuring that the generative pixel independence assumption can still be exploited. We demonstrate that our approach exhibits superior performance to classical generative forms of the LK algorithm. Furthermore, we demonstrate comparable performance to state-of-the-art methods such as the Supervised Descent Method with substantially fewer training examples, as well as the unique ability to "swap" geometric warp functions without having to retrain from scratch. Finally, from a theoretical perspective, our approach hints at possible redundancies in current state-of-the-art alignment methods that could be leveraged in vision systems of the future. Comment: 17 pages, 11 figures
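    The classical generative LK update that the abstract contrasts against can be sketched in a few lines. The following is an illustrative sketch only, not the authors' code: a single Gauss-Newton step for a pure-translation warp, using a nearest-pixel warp via `np.roll` for brevity; the helper name `lk_translation_step` is invented. The Conditional LK algorithm would instead learn the linear map from appearance to displacement from data.

```python
import numpy as np

def lk_translation_step(template, image, p):
    """One Gauss-Newton step of the classical (generative) Lucas-Kanade
    update for a pure-translation warp p = (dx, dy). Nearest-pixel
    warping via np.roll keeps the sketch short; real implementations
    interpolate sub-pixel positions."""
    dx, dy = int(round(p[0])), int(round(p[1]))
    warped = np.roll(np.roll(image, -dy, axis=0), -dx, axis=1)
    gy, gx = np.gradient(warped.astype(float))      # steepest-descent images
    J = np.stack([gx.ravel(), gy.ravel()], axis=1)  # N x 2 Jacobian
    r = (template - warped).ravel()                 # appearance error
    H = J.T @ J                                     # 2 x 2 Gauss-Newton Hessian
    dp = np.linalg.solve(H + 1e-8 * np.eye(2), J.T @ r)
    return p + dp

# Recover a known (dx, dy) = (2, 1) shift of a Gaussian blob.
x = np.arange(32, dtype=float)
X, Y = np.meshgrid(x, x)
template = np.exp(-((X - 16) ** 2 + (Y - 16) ** 2) / 32.0)
image = np.roll(np.roll(template, 1, axis=0), 2, axis=1)
p = np.zeros(2)
for _ in range(10):
    p = lk_translation_step(template, image, p)
```

    The per-pixel independence assumption shows up in the Jacobian: each row depends only on the gradient at that pixel, so the normal equations stay tiny regardless of image size.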

    CubeNet: Equivariance to 3D Rotation and Translation

    3D Convolutional Neural Networks are sensitive to transformations applied to their input. This is a problem because a voxelized version of a 3D object and its rotated clone will look unrelated to each other after passing through the final layer of a network. Instead, an idealized model would preserve a meaningful representation of the voxelized object while explaining the pose difference between the two inputs. An equivariant representation vector has two components: the invariant identity part, and a discernible encoding of the transformation. Models that cannot explain pose differences risk "diluting" the representation in pursuit of optimizing a classification or regression loss. We introduce a Group Convolutional Neural Network with linear equivariance to translations and right-angle rotations in three dimensions. We call this network CubeNet, reflecting its cube-like symmetry. By construction, this network helps preserve a 3D shape's global and local signature as it is transformed through successive layers. We apply this network to a variety of 3D inference problems, achieving state-of-the-art results on the ModelNet10 classification challenge and comparable performance on the ISBI 2012 Connectome Segmentation Benchmark. To the best of our knowledge, this is the first 3D rotation-equivariant CNN for voxel representations. Comment: Preprint
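    The equivariance property the abstract describes can be checked numerically. Below is a hedged sketch in 2D, a lower-dimensional analogue of CubeNet's 3D construction, using the four right-angle rotations (the C4 group); the names `correlate_wrap` and `lifting_gconv` are invented for illustration. Rotating the input rotates each output map and cyclically permutes the group channels, rather than producing an unrelated representation.

```python
import numpy as np

def correlate_wrap(image, kernel):
    """Plain cross-correlation with periodic boundary, built from shifted copies."""
    kc0, kc1 = kernel.shape[0] // 2, kernel.shape[1] // 2
    out = np.zeros_like(image, dtype=float)
    for (a, b), w in np.ndenumerate(kernel):
        out += w * np.roll(image, shift=(kc0 - a, kc1 - b), axis=(0, 1))
    return out

def lifting_gconv(image, kernel):
    """'Lifting' group correlation for the C4 rotation group: one output
    channel per rotated copy of the kernel (2D analogue of CubeNet's
    right-angle rotations in 3D)."""
    return np.stack([correlate_wrap(image, np.rot90(kernel, r)) for r in range(4)])

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))
ker = rng.normal(size=(3, 3))
out = lifting_gconv(img, ker)          # 4 x 8 x 8 group feature map
out_rot = lifting_gconv(np.rot90(img), ker)
```

    The invariant "identity" part of the representation can then be read off by pooling over the group axis (e.g. `out.max(axis=0)`), while the channel permutation encodes the pose.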

    Culture shapes how we look at faces

    Background: Face processing, amongst many basic visual skills, is thought to be invariant across all humans. From as early as 1965, studies of eye movements have consistently revealed a systematic triangular sequence of fixations over the eyes and the mouth, suggesting that faces elicit a universal, biologically determined information extraction pattern. Methodology/Principal Findings: Here we monitored the eye movements of Western Caucasian and East Asian observers while they learned, recognized, and categorized by race Western Caucasian and East Asian faces. Western Caucasian observers reproduced a scattered triangular pattern of fixations for faces of both races and across tasks. Contrary to intuition, East Asian observers focused more on the central region of the face. Conclusions/Significance: These results demonstrate that face processing can no longer be considered to arise from a universal series of perceptual events. The strategy employed to extract visual information from faces differs across cultures.

    Saccadic facilitation by modulation of microsaccades in natural backgrounds

    Saccades move objects of interest into the center of the visual field for high-acuity visual analysis. White, Stritzke, and Gegenfurtner (Current Biology, 18, 124–128, 2008) have shown that saccadic latencies in the context of a structured background are much shorter than those with an unstructured background at equal levels of visibility. This effect has been explained by possible preactivation of the saccadic circuitry whenever a structured background acts as a mask for potential saccade targets. Here, we show that background textures modulate rates of microsaccades during visual fixation. First, after a display change, structured backgrounds induce a stronger decrease of microsaccade rates than do uniform backgrounds. Second, we demonstrate that the occurrence of a microsaccade in a critical time window can delay a subsequent saccadic response. Taken together, our findings suggest that microsaccades contribute to the saccadic facilitation effect, due to a modulation of microsaccade rates by properties of the background.

    Optimal measurement of visual motion across spatial and temporal scales

    Sensory systems use limited resources to mediate the perception of a great variety of objects and events. Here a normative framework is presented for exploring how the problem of efficient allocation of resources can be solved in visual perception. Starting with a basic property of every measurement, captured by Gabor's uncertainty relation about the location and frequency content of signals, prescriptions are developed for optimal allocation of sensors for reliable perception of visual motion. This study reveals that a large-scale characteristic of human vision (the spatiotemporal contrast sensitivity function) is similar to the optimal prescription, and it suggests that some previously puzzling phenomena of visual sensitivity, adaptation, and perceptual organization have simple principled explanations. Comment: 28 pages, 10 figures, 2 appendices; in press in Favorskaya MN and Jain LC (Eds), Computer Vision in Advanced Control Systems using Conventional and Intelligent Paradigms, Intelligent Systems Reference Library, Springer-Verlag, Berlin
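    Gabor's uncertainty relation referenced here states that, for the RMS widths of the energy distributions |g(x)|² and |G(f)|² (with f in cycles per unit x), Δx·Δf ≥ 1/(4π), with equality for a Gaussian window. A short numerical check (grid size and σ are arbitrary choices for illustration):

```python
import numpy as np

def rms_width(values, weights):
    """RMS width of the distribution defined by non-negative weights."""
    p = weights / weights.sum()
    mean = (p * values).sum()
    return np.sqrt((p * (values - mean) ** 2).sum())

N, dx, sigma = 1024, 0.05, 1.0
x = (np.arange(N) - N // 2) * dx
g = np.exp(-x ** 2 / (2 * sigma ** 2))   # Gaussian window: achieves the bound

delta_x = rms_width(x, g ** 2)           # spatial spread of |g|^2

G = np.fft.fft(np.fft.ifftshift(g))      # spectrum (window re-centered first)
f = np.fft.fftfreq(N, d=dx)              # frequencies in cycles per unit x
delta_f = rms_width(f, np.abs(G) ** 2)   # spectral spread of |G|^2

product = delta_x * delta_f              # ~ 1/(4*pi) for a Gaussian
```

    Narrowing the window (smaller σ) shrinks Δx but inflates Δf in exact proportion, which is the resource trade-off the framework builds on.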

    Adaptive Filtering Enhances Information Transmission in Visual Cortex

    Sensory neuroscience seeks to understand how the brain encodes natural environments. However, neural coding has largely been studied using simplified stimuli. To assess whether the brain's coding strategy depends on the stimulus ensemble, we apply a new information-theoretic method that allows unbiased calculation of neural filters (receptive fields) from responses to natural scenes or other complex signals with strong multipoint correlations. In the cat primary visual cortex we compare responses to natural inputs with those to noise inputs matched for luminance and contrast. We find that neural filters adaptively change with the input ensemble so as to increase the information carried by the neural response about the filtered stimulus. Adaptation affects the spatial frequency composition of the filter, enhancing sensitivity to under-represented frequencies in agreement with optimal encoding arguments. Adaptation occurs over 40 s to many minutes, longer than most previously reported forms of adaptation. Comment: 20 pages, 11 figures, includes supplementary information
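    Why does filter estimation need special care for natural scenes? The paper's information-theoretic estimator is more sophisticated, but the textbook illustration of the underlying bias is the spike-triggered average: for correlated (natural-like) inputs the raw average is smeared by the stimulus covariance, while a decorrelated (whitened) version recovers the filter. A hedged simulation sketch with an invented linear-nonlinear model neuron (`w_true`, `rate` are illustrative names):

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, rho = 8, 20000, 0.8

# AR(1)-correlated Gaussian "natural-like" stimulus: cov C[i, j] = rho^|i-j|.
C = rho ** np.abs(np.subtract.outer(np.arange(D), np.arange(D)))
X = rng.normal(size=(N, D)) @ np.linalg.cholesky(C).T

w_true = np.zeros(D)
w_true[2:4] = 1.0                          # a localized "receptive field"
rate = np.maximum(X @ w_true, 0.0)         # rectified linear-nonlinear response

sta = X.T @ rate / N                       # raw spike-triggered average (biased)
w_hat = np.linalg.solve(X.T @ X / N, sta)  # decorrelated (whitened) estimate

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```

    The raw average points along C·w rather than w; dividing out the empirical covariance removes that smearing, which is why stimulus statistics must enter any unbiased receptive-field estimate.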

    Longer fixation duration while viewing face images

    The spatio-temporal properties of saccadic eye movements can be influenced by the cognitive demand and the characteristics of the observed scene. Probably due to its crucial role in social communication, it is argued that face perception may involve different cognitive processes compared with non-face object or scene perception. In this study, we investigated whether and how face and natural scene images can influence the patterns of visuomotor activity. We recorded monkeys’ saccadic eye movements as they freely viewed monkey face and natural scene images. The face and natural scene images attracted a similar number of fixations, but viewing of faces was accompanied by longer fixations than viewing of natural scenes. These longer fixations were dependent on the context of facial features. The duration of fixations directed at facial contours decreased when the face images were scrambled, and increased at the later stage of normal face viewing. The results suggest that face and natural scene images can generate different patterns of visuomotor activity. The extra fixation duration on faces may be correlated with the detailed analysis of facial features.

    Local biases drive, but do not determine, the perception of illusory trajectories

    When a dot moves horizontally across a set of tilted lines of alternating orientations, the dot appears to be moving up and down along its trajectory. This perceptual phenomenon, known as the slalom illusion, reveals a mismatch between the veridical motion signals and the subjective percept of the motion trajectory, which has not been comprehensively explained. In the present study, we investigated the empirical boundaries of the slalom illusion using psychophysical methods. The phenomenon was found to occur both under conditions of smooth pursuit eye movements and constant fixation, and to be consistently amplified by intermittently occluding the dot trajectory. When the motion direction of the dot was not constant, however, the stimulus display did not elicit the expected illusory percept. These findings confirm that a local bias towards perpendicularity at the intersection points between the dot trajectory and the tilted lines causes the illusion, but also highlight that higher-level cortical processes are involved in interpreting and amplifying the biased local motion signals into a global illusion of trajectory perception.
